Identification of Case, Digits and Special Symbols Using a Context Window
Authors
Abstract
We present strategies and results for identifying the symbol type (upper case, lower case, digit, or special symbol) of every character in a text document. Assuming reasonable word and character segmentation for shape clustering, we designed several type-recognition methods that rely on cluster n-grams, the characteristics of neighboring symbols, and within-word context. On an ASCII test corpus of 925 articles, these methods substantially outperform the baseline of assigning every character to lower case.
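The abstract does not spell out the individual rules, so the following is only a hypothetical sketch of what a within-word context rule could look like, not the authors' method. It assumes each character cluster in a word either already has a type (for example, because its shape differs between cases, as with N/n or E/e) or is ambiguous (`None`, for shapes such as O/o or S/s that typically differ mainly in size). An ambiguous cluster takes the majority type of its known neighbors inside a small window, falling back to lower case. The labels, the `window` parameter, and the majority vote are all assumptions made for illustration. The demo also scores the all-lower-case baseline mentioned above.

```python
from collections import Counter
from typing import List, Optional

# Hypothetical symbol-type labels (assumed, not taken from the paper).
LOWER, UPPER, DIGIT, SPECIAL = "lower", "upper", "digit", "special"


def assign_types(word: List[Optional[str]], window: int = 2) -> List[str]:
    """Fill in unknown (None) symbol types within a single word.

    A known entry is kept as-is. An unknown entry takes the majority type
    among known neighbors at most `window` positions away, staying inside
    the word. With no known neighbors it defaults to LOWER, the same
    default the all-lower-case baseline uses for every character.
    """
    result = []
    for i, t in enumerate(word):
        if t is not None:
            result.append(t)
            continue
        lo, hi = max(0, i - window), min(len(word), i + window + 1)
        votes = Counter(
            word[j] for j in range(lo, hi) if j != i and word[j] is not None
        )
        if votes:
            # Counter.most_common orders equal counts by first occurrence,
            # so ties go to the leftmost known neighbor type.
            result.append(votes.most_common(1)[0][0])
        else:
            result.append(LOWER)
    return result


def accuracy(pred: List[str], gold: List[str]) -> float:
    """Fraction of positions where the predicted type matches the truth."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


if __name__ == "__main__":
    # "NOSE": N and E assumed case-distinct in shape; O and S ambiguous.
    # "cost": c, o, s assumed ambiguous; t case-distinct.
    words = [[UPPER, None, None, UPPER], [None, None, None, LOWER]]
    gold = [[UPPER] * 4, [LOWER] * 4]
    pred = [t for w in words for t in assign_types(w)]
    flat_gold = [t for g in gold for t in g]
    baseline = [LOWER] * len(flat_gold)
    print("context rule:", accuracy(pred, flat_gold))      # 1.0
    print("all lower case:", accuracy(baseline, flat_gold))  # 0.5
```

On this toy input the context rule labels all eight characters correctly, while the baseline gets only the lower-case word right. A real system would combine such a rule with the other evidence the abstract lists (cluster n-grams and neighbor characteristics).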
Similar articles
Securing Interpretability of Fuzzy Models for Modeling Nonlinear MIMO Systems Using a Hybrid of Evolutionary Algorithms
In this study, a Multi-Objective Genetic Algorithm (MOGA) is utilized to extract interpretable and compact fuzzy rule bases for modeling nonlinear Multi-input Multi-output (MIMO) systems. In the process of nonlinear system identification, structure selection, parameter estimation, model performance and model validation are important objectives. Furthermore, securing low-level and high-level ...
Nonlinear system identification using higher order statistics
A general formula is given for the conditional mean in terms of higher-order statistics. Using this formula, a general scheme for nonlinear system identification is introduced, including a broad range of nonlinearities, which depends on the probability density function of the input. As a special case of that general scheme, the polynomial system identification problem is treated. It is shown that o...
PII: S0165-1684(01)00096-2
A subspace-based blind channel identification algorithm using only the fact that the received signal can be oversampled is proposed. No direct use is made in this algorithm of either the statistics of the input sequence or even of the fact that the symbols are from a finite set, and therefore this algorithm can be used to identify even channels in which arbitrary symbols are sent. Using this algor...
Open-loop worst-case identification of nonSchur plants
This paper presents an LMI-based algorithm for deterministic worst-case identification of nonSchur plants in an open-loop setting. Contrary to other approaches dealing with this problem, the proposed technique does not require prior knowledge of a stabilizing controller. The main result of the paper shows that, as the information is completed, the identified model converges, in the ℓ2-induced top...
Maximum-likelihood blind FIR multi-channel estimation with Gaussian prior for the symbols
We present two approaches to stochastic Maximum Likelihood identification of multiple FIR channels, where the input symbols are assumed Gaussian and the channel deterministic. These methods allow semi-blind identification, as they accommodate a priori knowledge in the form of a (short) training sequence, and appear to be more relevant in practice than purely blind techniques. The two approaches a...